(54) Method and system for correlating profile data dynamically generated from an optimized executable program with source code statements

(57) A method and system for relating profile data (700) generated by monitoring the execution of an optimized machine-code computer program (800) back to the source-language description (400) of the computer program. Logical line numbers (604) are associated with the basic blocks (606, 607, 608) of the intermediate-code representation (600) of the computer program, and actual line numbers (602) are associated with each instruction (610, 612, 614, 616) of the intermediate-code representation of the computer program. During optimization of the intermediate code, the logical line numbers remain fixed to basic blocks, while actual line numbers remain fixed to intermediate-code instructions. A branch instruction (610) and the target of the branch instruction (612) in the optimized machine-code computer program or in an optimized assembly-language computer program (600) can be related back to source-language statements (400) by using the actual line number (602) associated with the branch instruction (610) and the logical line number (604) associated with the basic block (618) that contains the target of the branch instruction.
Description

[0001] The present invention relates to the run-time performance analysis of computer software programs compiled by optimizing compilers and, in particular, to a method and system for correlating run-time profile data collected during the execution of an optimized machine-code version of a program with the source-language program from which the machine-code version of the program is generated.

[0002] Compilers are programs that translate computer programs written in source languages, such as FORTRAN, Pascal, C, and C++, into equivalent compiled programs consisting of assembly-language instructions or machine-code instructions. During the compilation process, a compiler may first translate a source-language program into intermediate code and then, during later stages of the compilation process, the compiler may optimize the intermediate code to produce the final assembly-language or machine-code version of the source-language program. Although there is a fairly direct sequential correspondence between the source-language program statements and the intermediate code, subsequent optimizations carried out by a compiler during the later stages of compilation may result in deletion of intermediate-code instructions, duplication of intermediate-code instructions, or significant alteration of the sequential order of the intermediate-code instructions. As a result, the instructions of the final assembly-language or machine-code program may not directly correspond to the original source-language program. Because of this lack of direct correspondence, it may be difficult or impossible to select a particular assembly-language or machine-code instruction from the compiled program and relate that instruction back to a particular line or source-language statement in the source-language program.

[0003] A common technique used by programmers to analyze the run-time behavior of compiled programs is the generation of run-time profile data. This process generates the relative frequencies at which each instruction in the assembly-language or machine-code program is executed during execution of the program. Profile data may be collected over multiple executions of the program using different input data in order to generate frequencies of execution of instructions under a representative average of various run-time conditions. The profile data may be generated by statistically sampling the contents of a program counter that contains an indication of the instruction currently being executed by a computer. Alternatively, profile data may be generated by trapping the computer prior to, or after, execution of each of a number of different instructions and incrementing a counter for the trapped instruction. A more recent profiling technique involves generation of profile data by dynamic translation of compiled programs, according to the teachings of US Patent No. 5,815,720, "Use of Dynamic Translation to Collect and Exploit Run-Time Information in an Optimizing Compilation System." This technique avoids a compilation step to produce instrumented machine-language or assembly-language code. Another recent profiling technique involves the use of special hardware support included in modern processors for run-time branch prediction to generate profile data, as outlined in: "Using Branch Handling Hardware to Support Profile-Driven Optimization," Conte, Patel, and Cox, Proceedings of the 1994 27th Annual International Symposium on Microarchitecture, Nov. 30-Dec. 2 (San Jose, CA).
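
The statistical-sampling approach described above can be illustrated with a minimal sketch. It is not taken from the patent: the function names `relative_frequencies` and `merge_runs`, and the sample addresses, are hypothetical, and the sampled program-counter values are assumed to be already available as plain integers.

```python
from collections import Counter


def relative_frequencies(pc_samples):
    """Turn sampled program-counter values into relative frequencies."""
    counts = Counter(pc_samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pc: n / total for pc, n in counts.items()}


def merge_runs(*runs):
    """Pool the samples of several executions before normalizing."""
    pooled = []
    for run in runs:
        pooled.extend(run)
    return relative_frequencies(pooled)


# Two hypothetical executions with different input data.
run_a = [0x10, 0x14, 0x14, 0x18]
run_b = [0x14, 0x14, 0x18, 0x18]
profile = merge_runs(run_a, run_b)
```

Pooling the samples before normalizing weights each execution by the number of samples it contributed, which is one simple way of averaging over run-time conditions; other weightings are possible.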

[0004] Profile data may be used by certain optimizing compilers to perform optimizations on the profiled source code to produce a more efficient version of the source code. Profile data may also be used by a programmer in order to analyze a program, in which case the profile data may be presented as a histogram, or, when possible, the source-language version of the program may be annotated to indicate the relative frequencies of execution for the various statements within the source-language program. Whether used directly by an optimizing compiler, or used by a programmer for analysis, the profile data collected from compiled program assembly-language or machine-code instructions needs to be correlated with the source-language statements to which the assembly-language or machine-code instructions correspond.

[0005] The problem of correlating profile data with a source-language program is difficult when the assembly-language or machine-code version of the program is generated by an optimizing compiler. In this case, as noted above, there is generally no way to correlate particular assembly-language or machine-code instructions to specific statements within the source-language program because of instruction duplication, elimination, or resequencing.
[0006] Certain prior profile-based optimization systems require that profile data be gathered using a specially instrumented and unoptimized version of a source-language program. However, this approach may be problematic for a number of reasons. First, profile data may be collected on a machine that includes only the optimized executable program. On such machines, there will be no specially instrumented and unoptimized version of the program for profiling. Also, certain problems and pathological behaviors that occur during execution of an optimized assembly-language or machine-code program may not occur during execution of the unoptimized assembly-language or machine-code version of the program. If the profile data is being used to identify the cause of such pathological behavior, then profile data collected from a specially instrumented unoptimized version of the program is not useful.
[0007] A need has therefore been recognized in the area of performance analysis of computer programs for an improved method for collecting profile data during the execution of optimized assembly-language or machine-code programs and relating the collected profile data back to the source-language program from which the optimized assembly-language or machine-code versions of the program are generated. Such a method needs to allow for collection of useful profile data generated from running optimized assembly-language or machine-code programs, and needs to provide a way for profile data generated from optimized assembly-language or machine-code programs to be related back to the source-language programs from which the assembly-language or machine-code program is generated.
[0008] The present invention provides a method for correlating profile data collected during execution of an optimized executable program back to the source-language statements within a source-language program from which the executable program is generated by an optimizing compiler. The optimizing compiler associates each intermediate-code instruction generated by the optimizing compiler with an actual line number, or, in other words, the line number of the source-language statement that generated the intermediate-code instruction. The optimizing compiler groups intermediate-code instructions together into basic blocks and associates each basic block with one or more logical line numbers. The logical line numbers refer to the line numbers of the source-language statements from which the intermediate-code instructions within a basic block are generated. At the intermediate-code level, there is a direct correspondence between logical line numbers and actual line numbers. The optimizing compiler may eliminate, duplicate, or move intermediate-code instructions within the basic block framework generated for the intermediate code in order to produce a final, optimized version of the program. The actual line numbers associated with each intermediate-code instruction are moved, eliminated, or duplicated along with the intermediate-code-level instructions during the optimization process. However, the logical line numbers remain fixed to basic blocks. When a basic block is eliminated, the logical line numbers associated with the basic block are marked as being unexecutable. When a basic block is duplicated, then the logical line numbers associated with the basic block are duplicated. Thus, optimization can be considered to be a process of eliminating, duplicating, and, most commonly, rearranging intermediate-code instructions, along with associated actual line numbers, within a fixed framework of basic blocks having fixed logical line numbers.
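
The bookkeeping described in this paragraph can be sketched as follows. This is an illustrative model only, under the assumption that a basic block is a simple container of instructions; the class and function names (`Instr`, `Block`, `move_instr`, `eliminate_block`, `duplicate_block`) and the hoisting scenario are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    aln: int  # actual line number: stays attached to the instruction


@dataclass
class Block:
    name: str
    llns: list  # logical line numbers: stay attached to the block
    instrs: list = field(default_factory=list)
    executable: bool = True


def move_instr(src, dst, index):
    """Move one instruction between blocks; its ALN travels with it."""
    dst.instrs.append(src.instrs.pop(index))


def eliminate_block(block):
    """Remove a block's code and mark its LLNs as unexecutable."""
    block.instrs.clear()
    block.executable = False


def duplicate_block(block, new_name):
    """Copy a block; the copy carries the same LLNs and the same ALNs."""
    copies = [Instr(i.text, i.aln) for i in block.instrs]
    return Block(new_name, list(block.llns), copies)


# Hypothetical optimization: hoist the test of source line 10 into the
# entry block, then duplicate and eliminate what is left of its block.
entry = Block("B1", [1, 5, 6, 7], [Instr("clr r5", 7)])
loop_test = Block("B3", [10], [Instr("cmp r3, #1", 10), Instr("bge L2", 10)])
move_instr(loop_test, entry, 0)
copy = duplicate_block(loop_test, "B3'")
eliminate_block(loop_test)
```

After these steps the entry block still reports logical lines 1 and 5-7, although it now contains an instruction whose actual line number is 10. That divergence is what the two kinds of line numbers are meant to capture.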

[0009] The optimizing compiler stores logical line numbers and actual line numbers along with the final assembly-language or machine-code instructions. Either during collection of profile data, or after the profile data has been collected, the method of the present invention relates optimized instructions back to specific source-language statements by using logical line numbers and actual line numbers associated with the optimized instructions. Generally, branch instructions are most critical for profiling the execution of a program. The method constructs, from the profile data, a network of arcs to indicate the relative frequencies at which each branch instruction causes execution control to be transferred to the various possible target instructions of that branch instruction. Each arc comprises an actual line number associated with a branch instruction, a logical line number associated with a target instruction of the branch instruction, and a relative frequency for the branching operation represented by the branch instruction and target instruction. By using the actual line number associated with the branch instruction, the method can correlate a branch instruction that may have been moved or duplicated back to a particular source-language line. By using the logical line number associated with the target instruction, the method can correlate a target instruction that may have been moved or duplicated back to a particular source-language line. The present invention creates a table representing the network of branch/target arcs. The contents of this table can then be used to annotate source-language programs to indicate the relative frequencies at which execution control is transferred by control statements within the source-language program to various source-language statements that are targets of the control statements.
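
A table of branch/target arcs of the kind described above might be assembled along the following lines. This is a sketch under stated assumptions, not the patent's implementation: the record layout `(branch_aln, target_lln, taken_count)`, the function name `build_arc_table`, and the sample counts are all hypothetical. The sample assumes that the loop test of source line 8 was duplicated by the optimizer, so two machine-level branches share the same actual line number.

```python
from collections import defaultdict


def build_arc_table(branch_records):
    """Aggregate (branch_aln, target_lln, taken_count) records into arcs.

    Returns a dict mapping (branch_aln, target_lln) to the fraction of
    executions of that source-line branch that went to that target line.
    """
    counts = defaultdict(int)
    for branch_aln, target_lln, taken in branch_records:
        counts[(branch_aln, target_lln)] += taken

    totals = defaultdict(int)
    for (branch_aln, _), n in counts.items():
        totals[branch_aln] += n

    return {
        arc: n / totals[arc[0]]
        for arc, n in counts.items()
        if totals[arc[0]] > 0
    }


# Illustrative counts only: two copies of the branch from line 8 enter the
# loop body (logical line 10), and one execution leaves the loop (line 19).
records = [(8, 10, 3), (8, 10, 2), (8, 19, 1)]
arcs = build_arc_table(records)
```

Because the key is built from the branch's actual line number and the target block's logical line number, duplicated copies of a branch collapse into a single source-level arc.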

FIG. 1 shows a control flow graph that represents the intermediate-level assembly-language version of the routine "skewed."

FIG. 2 shows a small sample array containing five scores.

FIG. 3 shows the control flow diagram that represents the routine "skewed."

[0010] The present invention will be described below in four subsections. In the first subsection, an example source-language routine is presented along with an unoptimized, intermediate-level, assembly-language version of the routine. Then, the concept of a control flow graph comprising linked basic blocks is presented, and profile data is generated from the assembly-language version of the routine and correlated with the source-language routine. In the second subsection, an optimized assembly-language version of the routine is presented. Profile data is generated for this optimized version of the assembly-language routine, and the difficulty of correlating this profile data with the original source-language routine is demonstrated. In the third subsection, the present invention is described in terms of the example routine and the optimized assembly-language version of the example routine described in the first two subsections. Finally, an implementation of the present invention is presented in the fourth subsection.

Introduction and Example 

[0011] A short, pseudo-code routine, called "skewed," is provided below:
 1  int skewed (int *scores, int num)
 2  {
 3      int i, sum, mid, low, high, average, res;
 4
 5      res = -1;
 6      i = 0;
 7      sum = 0;
 8      while (i < num)
 9      {
10          if (num < 1)
11          {
12              res = 0;
13              break;
14          }
15          if (i == 0) low = scores[i];
16          if (i == num - 1) high = scores[i];
17          sum += scores[i++];
18      }
19      if (res < 0)
20      {
21          average = sum / num;
22          mid = (low + high) / 2;
23          if (high > 2 * mid || high > 2 * average ||
24              high > 100 || mid > 96) res = 1;
25          else res = 0;
26      }
27      return res;
28  }



[0012] On line 1, this routine receives, as arguments, a pointer to the first element of an array of integers, called "scores," and an integer "num" that represents the number of scores contained in the array "scores." This program is purposely written to be inefficient in order to demonstrate, below, difficulties that arise from attempting to correlate instructions in an optimized version of the routine with statements in the source-language routine. The scores in the array "scores" are assumed to be arranged in ascending order.

[0013] On line 5, the routine "skewed" assigns the value "-1" to the variable "res." The variable "res" represents the value that will be returned by the routine "skewed." In the very inefficiently written while-loop comprising lines 8-18, the routine "skewed" first tests, on line 10, whether there are any scores in the array "scores." If there are no scores, then the variable "res" is assigned the value "0" on line 12 and execution of the while-loop is discontinued by the break statement on line 13. Otherwise, if this is the first iteration of the loop, as detected by the routine "skewed" on line 15, the variable "low" is assigned the first score in the array "scores." If this is the final iteration of the loop, as detected on line 16 of the routine "skewed," then the variable "high" is assigned the final score in the array "scores." On line 17, the routine "skewed" adds the score represented by the current iteration of the while-loop to the summation variable "sum." If the variable "res" has not been assigned the value "0" on line 12, as detected on line 19, then the routine "skewed" proceeds to compute the average of the scores, on line 21, and the midpoint score, on line 22. Then, the routine "skewed" makes a comparison of the values contained in the variables "high," "mid," and "average," on lines 23-24, to determine whether the distribution of scores in the array "scores" is skewed. If the routine "skewed" determines that the distribution is skewed, then the variable "res" is assigned the value "1" on line 24. Otherwise, the variable "res" is assigned the value "0" on line 25. Finally, on line 27, the routine "skewed" returns the value contained in variable "res."
[0014] The routine "skewed" rather inefficiently computes the midpoint score on line 22 by saving the first score of the array "scores" in the local variable "low," on line 15, and the last score in the array "scores" in the local variable "high," on line 16. The conditional statements of lines 15 and 16 are unnecessarily and repeatedly executed during each iteration of the while-loop. The midpoint score can be more efficiently calculated in the following single statement:

    mid = (scores[0] + high) / 2

Also, the test on line 10 is unnecessarily executed multiple times within the while-loop.
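
For comparison, a streamlined variant with the redundant tests hoisted out of the loop might look like the sketch below. It is a hypothetical Python port rather than part of the patent, and it rests on several assumptions: the scores arrive as a Python list instead of a pointer and a count, an empty list returns 0 (the apparent intent of lines 10-13), and the scores are non-negative, so that Python's floor division `//` agrees with the truncating integer division of the original.

```python
def skewed(scores):
    """Return 1 if the ascending list of scores looks skewed, else 0."""
    num = len(scores)
    if num < 1:  # tested once instead of on every loop iteration
        return 0

    total = sum(scores)
    average = total // num

    # First and last scores are read directly, with no per-iteration tests.
    low, high = scores[0], scores[num - 1]
    mid = (low + high) // 2

    if high > 2 * mid or high > 2 * average or high > 100 or mid > 96:
        return 1
    return 0
```

For example, `skewed([10, 20, 30, 40, 50])` yields 0, while `skewed([1, 2, 3, 4, 200])` yields 1 because the highest score exceeds twice the average.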

[0015] A pseudo-intermediate-code version of the routine "skewed" is shown below. The instruction formats and meanings will be described in detail following the pseudo-intermediate-code version of the routine "skewed." It should be noted that the target for an instruction in this pseudo-intermediate-code generally is specified as the first operand of the instruction, and the source or sources for an instruction generally are specified as the second and subsequent operands. Approximate source code lines of the routine "skewed" to which the pseudo-intermediate-code instructions correspond are provided as comments, following the comment symbol "*". Register assignments are indicated in the comment block preceding the first instruction on line 1.



    * r0   - return address
    * r1   - return value
    * r2   - &(array[0])
    * r3   - num
    * r4   - i
    * r5   - sum
    * r6   - mid
    * r7   - low
    * r8   - high
    * r9   - average
    * r10  - res
    * r11
    * r12
 1  skewed:  sub    sp, sp, #36        * 1
 2           store  0(sp), r4          * 1
 3           store  4(sp), r5          * 1
 4           store  8(sp), r6          * 1
 5           store  12(sp), r7         * 1
 6           store  16(sp), r8         * 1
 7           store  20(sp), r9         * 1
 8           store  24(sp), r10        * 1
 9           store  28(sp), r11        * 1
10           store  32(sp), r12        * 1
11           mov    r10, #-1           * 5
12           clr    r4                 * 6
13           clr    r5                 * 7
14  L1:      cmp    r4, r3             * 8
15           bge    L5                 * 8
16           cmp    r3, #1             * 10
17           bge    L2                 * 10
18           clr    r10                * 12
19           br     L5                 * 13
20  L2:      cmp    r4, #0             * 15
21           bne    L3                 * 15
22           mov    r11, r4            * 15
23           lshft  r11, #2            * 15
24           load   r7, r11(r2)        * 15
25  L3:      mov    r11, r3            * 16
26           dec    r11                * 16
27           cmp    r4, r11            * 16
28           bne    L4                 * 16
29           mov    r11, r4            * 16
30           lshft  r11, #2            * 16
31           load   r8, r11(r2)        * 16
32  L4:      mov    r11, r4            * 17
33           lshft  r11, #2            * 17
34           load   r12, r11(r2)       * 17
35           add    r5, r5, r12        * 17
36           inc    r4                 * 17
37           br     L1                 * 18
38  L5:      cmp    r10, #0            * 19
39           bge    L7                 * 19
40           div    r9, r5, r3         * 21
41           add    r6, r7, r8         * 22
42           rshft  r6, #1             * 22
43           mov    r11, r6            * 23
44           lshft  r11, #1            * 23
45           cmp    r8, r11            * 23
46           bgt    L6                 * 23
47           mov    r11, r9            * 23
48           lshft  r11, #1            * 23
49           cmp    r8, r12            * 23
50           bgt    L6                 * 23
51           cmp    r8, #100           * 24
52           bgt    L6                 * 24
53           cmp    r6, #96            * 24
54           bgt    L6                 * 24
55           mov    r10, #0            * 25
56           br     L7                 * 25
57  L6:      mov    r10, #1            * 24
58  L7:      mov    r0, r10            * 27
59           load   r4, 0(sp)          * 27
60           load   r5, 4(sp)          * 27
61           load   r6, 8(sp)          * 27
62           load   r7, 12(sp)         * 27
63           load   r8, 16(sp)         * 27
64           load   r9, 20(sp)         * 27
65           load   r11, 24(sp)        * 27
66           load   r11, 28(sp)        * 27
67           load   r12, 32(sp)        * 27
68           add    sp, sp, #36        * 27
69           jmp    r0                 * 27


[0016] The above intermediate-code version of the routine "skewed" is written in a generic pseudo-assembly language. The first instruction, on line 1 of the intermediate-code version of "skewed," includes the label "skewed" and the instruction "sub sp, sp, #36." Labels are used as symbolic addresses for target instructions of branch instructions. An instruction, like the above-quoted first instruction, comprises an operation code ("op code") and 0, 1, or more operands. Operands may include labels, registers, virtual registers, and literal values. In the case of the first instruction, the op code is "sub" and the three operands are the register "sp," the register "sp," again, and the integer value "36." The first instruction is a subtraction instruction that subtracts the number "36" from the contents of register "sp" and places the result into register "sp." Registers are designated by a lower-case "r" followed by a number, such as register "r3," or by a special two-character designation such as the designation "sp" for the dedicated stack pointer register. Labels, other than the initial label "skewed," are designated by an "L" followed by a numeral. Finally, literal integer values are designated by a "#" symbol followed by a numeric representation of the value. Instruction op codes used in the intermediate-level assembly code version of the routine "skewed" include: (1) "cmp," an operation that compares two values; (2) "bne," an operation that causes a branch to another instruction within the program specified by a label when a preceding "cmp" operation determines that the two compared values are not equal; (3) "br," an operation that causes an unconditional branch to a labeled instruction; (4) "mov," an operation that copies a value from a source operand to a destination operand; (5) "dec," an operation that decrements the value of a second operand and stores the result into the register or virtual register specified by a first operand; (6) "lshft," an operation that arithmetically left-shifts the value stored in the register specified by a first operand by the number of places specified by a second operand; (7) "rshft," an analogous operation to the previously described operation "lshft," except that the arithmetic shift is in the right-hand direction; (8) "clr," an operation that sets the value of an operand to "0"; (9) "bge," an operation that branches to a labeled instruction when a preceding compare instruction determines that the value of its first operand is greater than or equal to the value of its second operand; (10) "add," an operation that adds the values specified by a second and a third operand and places the result into a first operand; (11) "inc," an operation that increments the value stored in the first operand by 1; (12) "div," an operation that divides the value specified by a second operand by the value specified by a third operand and stores the result in a first operand; (13) "bgt," an operation that branches to a labeled instruction when a previous compare instruction determines that the value of its first operand is greater than the value of its second operand; (14) "jmp," an operation that unconditionally branches to a location specified by an operand; (15) "bge," an operation that causes a branch to another instruction within the program specified by a label when a preceding "cmp" operation determines that the first compared value is identical to or greater than the second compared value; (16) "sub," an operation that subtracts the value specified by the third operand from the value specified by the second operand and places the result into the first operand; (17) "store," an operation that stores the value specified by the second operand into the memory location specified by the first operand; and (18) "load," an operation that loads the memory value specified by the second operand into the register specified by the first operand.

[0017] The notation "r1(r2)" specifies the value of a memory location addressed by the contents of register "r2" plus an offset stored in register "r1." This indirect memory addressing is used for accessing values stored in memory. Note that registers "r0," "r1," "r2," and "r3" have special significance that is defined by the compiler that generates the pseudo-intermediate-level code. Register "r0" contains the return address to which execution control is transferred when the routine has finished. Register "r1" contains the return value returned by the routine "skewed." Register "r2" contains the address of the first element of the array "scores," passed to the routine as the first argument, and register "r3" contains the value "num" passed to the routine as the second argument.

[0018] The above-described intermediate-level assembly-language version of the routine "skewed" is shown below, in Table 1, with horizontal dashed lines separating various groups of instructions.



TABLE 1

------------------------------ B1 ------------------------------
LLN: 1, 5-7     1  skewed:  sub    sp, sp, #36      ALN: 1
                2           store  0(sp), r4        ALN: 1
                3           store  4(sp), r5        ALN: 1
                4           store  8(sp), r6        ALN: 1
                5           store  12(sp), r7       ALN: 1
                6           store  16(sp), r8       ALN: 1
                7           store  20(sp), r9       ALN: 1
                8           store  24(sp), r10      ALN: 1
                9           store  28(sp), r11      ALN: 1
               10           store  32(sp), r12      ALN: 1
               11           mov    r10, #-1         ALN: 5
               12           clr    r4               ALN: 6
               13           clr    r5               ALN: 7
------------------------------ B2 ------------------------------
LLN: 8         14  L1:      cmp    r4, r3           ALN: 8
               15           bge    L5               ALN: 8
------------------------------ B3 ------------------------------
LLN: 10        16           cmp    r3, #1           ALN: 10
               17           bge    L2               ALN: 10
------------------------------ B4 ------------------------------
LLN: 12, 13    18           clr    r10              ALN: 12
               19           br     L5               ALN: 13
------------------------------ B5 ------------------------------
LLN: 15        20  L2:      cmp    r4, #0           ALN: 15
               21           bne    L3               ALN: 15
------------------------------ B6 ------------------------------
LLN: 15        22           mov    r11, r4          ALN: 15
               23           lshft  r11, #2          ALN: 15
               24           load   r7, r11(r2)      ALN: 15
------------------------------ B7 ------------------------------
LLN: 16        25  L3:      mov    r11, r3          ALN: 16
               26           dec    r11              ALN: 16
               27           cmp    r4, r11          ALN: 16
               28           bne    L4               ALN: 16
------------------------------ B8 ------------------------------
LLN: 16        29           mov    r11, r4          ALN: 16
               30           lshft  r11, #2          ALN: 16
               31           load   r8, r11(r2)      ALN: 16
------------------------------ B9 ------------------------------
LLN: 17, 18    32  L4:      mov    r11, r4          ALN: 17
               33           lshft  r11, #2          ALN: 17
               34           load   r12, r11(r2)     ALN: 17
               35           add    r5, r5, r12      ALN: 17
               36           inc    r4               ALN: 17
               37           br     L1               ALN: 18
------------------------------ B10 -----------------------------
LLN: 19        38  L5:      cmp    r10, #0          ALN: 19
               39           bge    L7               ALN: 19
------------------------------ B11 -----------------------------
LLN: 21, 22,   40           div    r9, r5, r3       ALN: 21
     23        41           add    r6, r7, r8       ALN: 22
               42           rshft  r6, #1           ALN: 22
               43           mov    r11, r6          ALN: 23
               44           lshft  r11, #1          ALN: 23
               45           cmp    r8, r11          ALN: 23
               46           bgt    L6               ALN: 23
------------------------------ B12 -----------------------------
LLN: 23        47           mov    r11, r9          ALN: 23
               48           lshft  r11, #1          ALN: 23
               49           cmp    r8, r12          ALN: 23
               50           bgt    L6               ALN: 23
------------------------------ B13 -----------------------------
LLN: 24        51           cmp    r8, #100         ALN: 24
               52           bgt    L6               ALN: 24
------------------------------ B14 -----------------------------
LLN: 24        53           cmp    r6, #96          ALN: 24
               54           bgt    L6               ALN: 24
------------------------------ B15 -----------------------------
LLN: 25        55           mov    r10, #0          ALN: 25
               56           br     L7               ALN: 25

The groups of instructions separated by horizontal dashed lines are called "basic blocks," and are labeled in Table 1 with a "B" followed by a numeral. A basic block is a contiguous group of instructions that starts with an instruction that may be the target of a branch or jump instruction, or that may be reached during sequential execution of the instructions of the routine. No other instruction within a basic block, other than the first instruction, can be the target of a branch or jump instruction. No instruction within a basic block, other than the last instruction in the basic block, can transfer execution control from within the basic block to an instruction outside of the basic block. Thus, all the instructions in a basic block execute in sequence after the first instruction of the basic block is executed. For example, the "cmp" and "bge" instructions on lines 14-15 in Table 1 together comprise basic block "B2" because the "cmp" instruction on line 14 is the target of the "br" instruction on line 37, and because the "bge" instruction on line 15 is a branch instruction. Many compiler techniques are simplified by considering basic blocks rather than individual instructions. In essence, a basic block can be considered a sort of meta-instruction that can either be executed or not executed, depending on the flow of control during execution of a program.
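
The partitioning rule just described can be sketched in a few lines. The sketch makes simplifying assumptions that are not in the patent: every labeled instruction is treated as a potential branch target, each instruction is reduced to a `(line, label, opcode)` tuple, and the names `BRANCH_OPS` and `split_basic_blocks` are hypothetical. The fragment covers lines 14-25 of Table 1.

```python
BRANCH_OPS = {"br", "bne", "bge", "bgt", "jmp"}


def split_basic_blocks(instrs):
    """Group (line, label, opcode) tuples into basic blocks.

    A new block starts at every labeled instruction and after every
    branch or jump. Returns a list of lists of line numbers.
    """
    blocks, current = [], []
    for line, label, opcode in instrs:
        if current and label is not None:
            blocks.append(current)  # a label opens a new block
            current = []
        current.append(line)
        if opcode in BRANCH_OPS:
            blocks.append(current)  # a branch closes the current block
            current = []
    if current:
        blocks.append(current)
    return blocks


fragment = [
    (14, "L1", "cmp"), (15, None, "bge"),
    (16, None, "cmp"), (17, None, "bge"),
    (18, None, "clr"), (19, None, "br"),
    (20, "L2", "cmp"), (21, None, "bne"),
    (22, None, "mov"), (23, None, "lshft"), (24, None, "load"),
    (25, "L3", "mov"),
]
blocks = split_basic_blocks(fragment)
```

The first five groups produced for this fragment correspond to basic blocks "B2" through "B6" of Table 1; the final group is only the first instruction of "B7," because the fragment stops there.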

[0019] An important characterization of a program is a control flow graph. FIG. 1 shows a control flow graph that represents the intermediate-level assembly-language version of the routine "skewed." The nodes in the control flow graph of FIG. 1 correspond to basic blocks of the intermediate-level assembly-language routine shown in Table 1. The edges linking the nodes, shown by arrows in FIG. 1, such as arrow 102, represent possible transfer of execution control by the last instruction of one basic block to the first instruction of another basic block. For example, the "bge" instruction on line 15 of Table 1, the last instruction of basic block "B2," may transfer control to the "cmp" instruction on line 38, the first instruction of basic block "B10." However, if the "bge" instruction on line 15 does not transfer execution control to basic block "B10," execution control is automatically transferred to the "cmp" instruction on line 16, the first instruction of basic block "B3," by virtue of the sequential nature of program execution. Thus, in FIG. 1, the node representing basic block "B2" 104 is shown with edges pointing to basic block "B10" 102 and to basic block "B3" 106.

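
The edges of such a control flow graph follow mechanically from the last instruction of each basic block. The sketch below assumes a compact, hypothetical encoding (`BLOCKS`, `LABEL_TO_BLOCK`, `successors`) transcribed by hand from Table 1: each entry gives a block name, the op code of its last instruction, and the label it branches to, if any.

```python
LABEL_TO_BLOCK = {"L1": "B2", "L2": "B5", "L3": "B7", "L4": "B9",
                  "L5": "B10", "L6": "B16", "L7": "B17"}

# (block, op code of last instruction, branch label or None), in program order
BLOCKS = [
    ("B1", "clr", None), ("B2", "bge", "L5"), ("B3", "bge", "L2"),
    ("B4", "br", "L5"), ("B5", "bne", "L3"), ("B6", "load", None),
    ("B7", "bne", "L4"), ("B8", "load", None), ("B9", "br", "L1"),
    ("B10", "bge", "L7"), ("B11", "bgt", "L6"), ("B12", "bgt", "L6"),
    ("B13", "bgt", "L6"), ("B14", "bgt", "L6"), ("B15", "br", "L7"),
    ("B16", "mov", None), ("B17", "jmp", None),
]

UNCONDITIONAL = {"br", "jmp"}


def successors(blocks, label_to_block):
    """Map each block to its successors: branch target first, then fall-through."""
    edges = {}
    for i, (name, last_op, label) in enumerate(blocks):
        out = []
        if label is not None:
            out.append(label_to_block[label])
        # Anything but an unconditional transfer can fall into the next block.
        if last_op not in UNCONDITIONAL and i + 1 < len(blocks):
            out.append(blocks[i + 1][0])
        edges[name] = out
    return edges


cfg = successors(BLOCKS, LABEL_TO_BLOCK)
```

With this encoding, `cfg["B2"]` lists "B10" and "B3," matching the two edges described for basic block "B2" above.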
[0020] Table 1 includes one or more logical line numbers ("LLN") and actual line numbers ("ALN") associated with basic blocks and instructions, respectively. For example, basic block 1 is associated with LLNs 1 and 5-7. This means that the instructions of basic block 1 implement source code statements of the source-level version of the routine "skewed," shown above, that occur on line 1 and on lines 5-7. In Table 1, the instructions on lines 1-10 are all associated with ALN 1, indicating that these instructions implement the first line of the source-language version of routine "skewed." These instructions reserve room on the stack and save the contents of registers "r4"-"r12" onto the stack. Line 11 in Table 1 implements the source-language statement on line 5 of the source-language version of the routine "skewed" that assigns the value "-1" to the variable "res." Note that, in the intermediate-code version of the routine "skewed," shown in Table 1, the LLNs and ALNs directly correspond to one another. In other words, basic block "B1" is associated with LLNs 1 and 5-7 and the instructions contained in basic block "B1" are associated with ALNs 1 and 5-7.
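
The direct correspondence noted above can be expressed as a simple check: before optimization, the set of ALNs found inside a block equals the block's set of LLNs. The sketch below uses hypothetical names (`alns_match_llns`, `TABLE_1_SAMPLE`) and a few blocks transcribed from Table 1; the final example, with an instruction from source line 10 moved into "B1," is an invented illustration of how optimization breaks the correspondence.

```python
def alns_match_llns(llns, alns):
    """True when the instructions' ALNs cover exactly the block's LLNs."""
    return set(llns) == set(alns)


# block name -> (logical line numbers, actual line number per instruction)
TABLE_1_SAMPLE = {
    "B1": ([1, 5, 6, 7], [1] * 10 + [5, 6, 7]),
    "B4": ([12, 13], [12, 13]),
    "B9": ([17, 18], [17] * 5 + [18]),
    "B11": ([21, 22, 23], [21, 22, 22, 23, 23, 23, 23]),
}

consistent = all(
    alns_match_llns(llns, alns) for llns, alns in TABLE_1_SAMPLE.values()
)

# Hypothetical optimized "B1": an instruction with ALN 10 was moved in,
# but the block's LLNs stayed fixed.
after_move = alns_match_llns([1, 5, 6, 7], [1] * 10 + [5, 6, 7, 10])
```

Here `consistent` is true for the unoptimized blocks and `after_move` is false, which is the situation in which both kinds of line numbers are needed.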

TABLE 1 (continued)

block  LLN   line  label  instruction
B16    24    57    L6:    mov r10, #1
B17    27    58    L7:    mov r0, r10
             59           load r4, 0(sp)
             60           load r5, 4(sp)
             61           load r6, 8(sp)
             62           load r7, 12(sp)
             63           load r8, 16(sp)
             64           load r9, 20(sp)
             65           load r10, 24(sp)
             66           load r11, 28(sp)
             67           load r12, 32(sp)
             68           add sp, sp, #36
             69           jmp 0

[0021] FIG. 2 shows a small sample array containing five scores. Table 2, below, shows a histogram of the number
of times each instruction in the intermediate-code version of the routine "skewed," shown in Table 1, is executed when
the routine "skewed" is executed with the array shown in FIG. 2 as input, along with the integer "5," representing the
number of scores in the array. For example, the single symbol "X" in the first column of Table 2 indicates that instructions
1-13 are each executed one time when the routine "skewed" is run. The second column in Table 2 indicates that
instructions 14-15 are both executed six times when the routine "skewed" is executed on the array shown in FIG. 2. The
histogram shown in Table 2 is indicative of the profile data that may be collected during the analysis of the
execution of a routine, such as the routine "skewed." In this case, a notation, represented by the symbol "X," is made
every time an instruction is executed. Commonly, a histogram will show the relative frequencies of execution of various
instructions rather than the absolute counts. More practical profiling techniques may trap, during execution, only certain
types of instructions and save counts representing the number of times each instruction is trapped, or may perform a
statistical analysis by interrupting execution of the program at random short intervals and noting the instruction that is
currently being executed or that will next be executed by the computer. Although statistical profiling does not give exact
numbers of times of execution of each instruction, it may efficiently provide quite accurate relative frequencies of
execution of the different instructions, particularly when the profile data is collected over many different executions of a
single routine or program.
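The statistical approach described above can be sketched in a few lines of C++. This is only an illustration: the function name "relativeFrequencies" and the representation of the samples as a vector of instruction numbers are assumptions made for the sketch and do not appear in the description.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical helper: converts sampled instruction numbers (one entry per
// profiling interrupt) into relative execution frequencies.
// Precondition: at least one sample was collected.
std::map<int, double> relativeFrequencies(const std::vector<int>& samples) {
    assert(!samples.empty());
    std::map<int, int> hits;
    for (int pc : samples) ++hits[pc];  // histogram of sampled instructions
    std::map<int, double> freq;
    for (const auto& kv : hits)
        freq[kv.first] = static_cast<double>(kv.second) / samples.size();
    return freq;
}
```

With the made-up samples {14, 14, 15, 1}, instruction 14 receives a relative frequency of 0.5. The estimate carries no absolute counts, which is the trade-off noted above.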
TABLE 2

instructions:  1-13  14-15  16-17  18-19  20-21  22-24  25-28  29-31  32-37  38-56  57  59-70
[0022] It is very straightforward to use the profile data, represented above in Table 2, to annotate the edges of the
control flow diagram, shown in FIG. 1, or to create a table of branch points within the routine "skewed" and list the
frequency with which each branch instruction branched, during execution of the routine, to each possible target of the
branch instruction. FIG. 3 shows the control flow diagram that represents the routine "skewed" annotated with the
frequencies with which control flows through each edge of the control flow diagram during execution of the routine
"skewed" with the array shown in FIG. 2 as input. It is easy to decide, for example, that edge 301 should be annotated to
indicate that control flows through edge 301 only once during execution of the routine "skewed." This is because, as shown
in Table 2, instructions 1-13 in basic block "B1" are all executed only one time. This is a trivial result, since there are no
edges that point to basic block "B1," and, therefore, once basic block "B1" is executed, there is no way for control to flow
back to it. Similarly, because, as shown in Table 2, instructions 38 and 39 are executed only once, we know that edge 302
should be annotated with the number "1."

[0023] Since instructions 14 and 15 are executed six times, as shown in Table 2, edge 303 must have the value
"6" minus "1" = "5." This analysis is similar to the analysis of inputs and outputs in circuit diagrams. The number of times
the final instruction in the basic block is executed can be obtained directly from Table 2. The sum of the numerical
annotations of the edges flowing out from a basic block must equal the number of times the final branch instruction of the
basic block is executed. This analysis can be used when the profile data comprises absolute counts of instruction
execution or when the profile data comprises relative frequencies of execution.
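The conservation rule in the preceding paragraph can be written out as a small C++ sketch. The function name "remainingEdgeCount" is hypothetical; the sketch only assumes that all but one of the outgoing edge counts of a basic block are already known.

```cpp
#include <cassert>
#include <vector>

// The counts on the edges leaving a basic block must sum to the number of
// times the block's final branch instruction executed, so one unknown edge
// count follows by subtraction.
int remainingEdgeCount(int finalBranchCount, const std::vector<int>& knownEdges) {
    int known = 0;
    for (int c : knownEdges) known += c;
    assert(known <= finalBranchCount);  // otherwise the profile is inconsistent
    return finalBranchCount - known;
}
```

For edge 303, the final branch executed six times and the other outgoing edge carries a count of one, so remainingEdgeCount(6, {1}) yields 5, as in the text.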

[0024] Because there is a direct correspondence between the intermediate-code instructions and the source-language
statements of the routine "skewed," the annotations in FIG. 3 can be directly correlated with the source-language
statements. For example, as can be seen in the annotated intermediate-code version of the routine "skewed," shown in
Table 1, the branch on line 15 is associated with the while statement on line 8 of the source-language version of
"skewed." The five times that control fell through the branch instruction on line 15 to the "cmp" instruction on line 16
represent the number of times the while-loop began iterating on line 10 of the source-language version of the routine
"skewed." The single time that the "bge" instruction on line 15 caused a branch to line 38, in Table 1, corresponds to
termination of the while-loop on line 8 of the source-language version of the routine "skewed," and execution of the
following source-language statement on line 14 of the source-language version of the routine "skewed."
[0025] Table 3, shown below, is yet another representation of the profile data shown above in Table 2. In Table 3,
the first two columns correspond to the intermediate-code line and the source-language line of each branch instruction
in the intermediate-code version of the routine "skewed," shown in Table 1, and the third and fourth columns represent
the intermediate-code line and source-language line of the target instruction of the branch instruction specified in the
first two columns. The fifth column contains a count of the number of times the branch instruction designated in columns
one and two has executed to transfer control to the target specified in columns three and four. For example, the first line
in Table 3 indicates that the "bge" instruction on line 15 in Table 1, corresponding to the while-loop on line 8 of the
source-language version of the routine "skewed," shown above, branches to line 38 of Table 1, corresponding to
source-language line 19, once during execution of the routine "skewed." Thus, armed with profile data as exemplified
by Table 2, it is easy either to annotate the control flow graph, shown in FIG. 3, or to create a branch instruction table, as
shown in Table 3, to provide a very clear indication of the number of times branch instructions within the
intermediate-code version of the routine, or control statements within the source-language version of the routine, caused
execution to resume at various corresponding target instructions or target source statements, respectively.
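One way to picture how a branch table of this kind could be filled in is sketched below in C++. The sketch assumes that an execution trace (the sequence of executed instruction line numbers) is available, which is a stronger input than the histogram of Table 2; the name "countArcs" and the trace representation are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// For every branch instruction in the trace, count how often control went to
// each successor instruction. The key is (branch line, target line).
std::map<std::pair<int, int>, int> countArcs(const std::vector<int>& trace,
                                             const std::set<int>& branchLines) {
    std::map<std::pair<int, int>, int> arcs;
    for (std::size_t i = 0; i + 1 < trace.size(); ++i) {
        if (branchLines.count(trace[i]) != 0)
            ++arcs[std::make_pair(trace[i], trace[i + 1])];
    }
    return arcs;
}
```

Each key of the returned map corresponds to columns one and three of a table such as Table 3, and the mapped value to the count column; the source-line columns would be filled in separately.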
Optimized Example

[0026] The following source-language version of the routine "skewed" is meant to illustrate how an optimizing
compiler might optimize the unoptimized routine "skewed," shown above. Note that optimized source-language code is not
generated by an optimizing compiler and that the optimizations represented by this optimized source-language version
of "skewed" would not necessarily be produced by any particular optimizing compiler. This optimized source-language
version of "skewed" is provided simply for illustrative purposes.



    int skewed (int *array, int num)
    {
        int i, sum, mid, average, high, res;

        res = -1;
        if (num < 1) res = 0;               /* empty input */
        else
        {
            i = 0;
            sum = 0;
            high = array[num - 1];          /* last score */
            while (i < num) sum += array[i++];
            average = sum / num;
            mid = (array[0] + high) / 2;    /* midpoint of first and last scores */
            if (high > 2 * mid || high > 2 * average ||
                high > 100 || mid > 96) res = 1;
            else res = 0;
        }
        return res;
    }



[0027] An assembly-language version of the optimized routine "skewed" is shown below: 
line  label    instruction
 1    skewed:  sub sp, sp, #32
 2             store 0(sp), r4
 3             store 4(sp), r5
 4             store 8(sp), r6
 6             store 12(sp), r7
 7             store 16(sp), r8
 8             store 20(sp), r9
 9             store 24(sp), r10
10             store 28(sp), r11
11             mov r9, #-1
16             cmp r3, #1
17             bge L1
18             clr r9
19             br L5
20    L1:      clr r4
21             clr r5
22             mov r10, r3
23             dec r10
24             lshft r10, #2
25             load r7, r10(r2)
26    L2:      cmp r4, r3
27             bge L3
28             mov r10, r4
29             lshft r10, #2
30             load r11, r10(r2)
31             add r5, r5, r11
32             inc r4
33             br L2
34    L3:      div r8, r5, r3
35             load r10, 0(r2)
36             add r6, r10, r7
37             rshft r6, #1
38             mov r10, r6
39             lshft r10, #1
40             cmp r7, r10
41             bgt L4
42             mov r10, r8
43             lshft r10, #1
44             cmp r7, r11
45             bgt L4
46             cmp r7, #100
47             bgt L4
48             cmp r6, #96
49             bgt L4
50             mov r9, #0
51             br L5
52    L4:      mov r9, #1
53    L5:      mov r0, r9
54             load r4, 0(sp)
55             load r5, 4(sp)
56             load r6, 8(sp)
57             load r7, 12(sp)
58             load r8, 16(sp)
59             load r9, 20(sp)
60             load r10, 24(sp)
61             load r11, 28(sp)
62             add sp, sp, #32
63             jmp 0



[0028] Comparisons of the unoptimized and optimized source-language versions of "skewed" and assembly-language
versions of "skewed" will show that many statements and instructions have been eliminated, and that the order of the
statements and instructions has changed substantially. Table 4, below, shows a histogram of instruction execution,
similar to Table 2, for execution of the optimized assembly-language version of "skewed" on the array shown in FIG. 2.

[0029] Table 5, shown below, is a branch table created from the profile data shown in Table 4, similar to Table 3.
TABLE 5

          branch                         target
assembly line   source line    assembly line   source line    count
     17                             20
     17                             18
     19                             53
     27                             34                           1
     27                             28                           5
     33                             26                           5
     41                             52                           0
     41                             42                           1
     45                             52                           0
     45                             46                           1
     47                             52                           0
     47                             48                           1
     49                             52                           0
     49                             50                           1
     51                             53                           1



[0030] Table 5 illustrates the problem of correlating profile data collected from the execution of optimized routines
with source-language versions of those routines. For example, the fourth entry in Table 5 indicates that a branch
instruction on line 27 of the optimized assembly-language version of the routine branches once to a target instruction on
line 34. Referring to Table 1, line 27 does not contain a branch instruction but instead contains a "cmp" instruction.
Furthermore, line 34, the target of this branch instruction, occurs within a basic block, but, by definition, the target of any
branch instruction must be the first instruction of a basic block. The annotated unoptimized assembly-language version of
the routine, shown in Table 1, is the only link between the optimized assembly-language version of the routine and the
source-language routine. Thus, columns two and four in Table 5 cannot be completed, as they were in Table 3. There
is no direct way to relate the branch instructions of the optimized version of the routine "skewed" back to the
unoptimized source-language version of the routine "skewed." Of course, if the illustrative optimized source-language
routine, shown above, were available, a correlation could be directly made. However, as pointed out above, optimized
source-language versions of routines are not generated by optimizing compilers and are not available to assist the
profile data correlation.

Method of the Present Invention

[0031] Table 6, shown below, shows the optimized assembly-language version of the routine "skewed," shown
above in the previous subsection, overlaid onto the basic block structure of Table 1.
TABLE 6

block  LLN          line  label    instruction          ALN
B1     1, 5-7         1   skewed:  sub sp, sp, #32        1
                      2            store 0(sp), r4        1
                      3            store 4(sp), r5        1
                      4            store 8(sp), r6        1
                      6            store 12(sp), r7       1
                      7            store 16(sp), r8       1
                      8            store 20(sp), r9       1
                      9            store 24(sp), r10      1
                     10            store 28(sp), r11      1
                     11            mov r9, #-1            5
B2     8
B3     10            16            cmp r3, #1            10
                     17            bge L1                10
B4     12, 13        18            clr r9                12
                     19            br L5                 13
B5     15
B6     15
B7     16            20   L1:      clr r4                 6
                     21            clr r5                 7
                     22            mov r10, r3           16
                     23            dec r10               16
B8     16            24            lshft r10, #2         16
                     25            load r7, r10(r2)      16
B9     17, 18        26   L2:      cmp r4, r3             8
                     27            bge L3                 8
                     28            mov r10, r4           17
                     29            lshft r10, #2         17
                     30            load r11, r10(r2)     17
                     31            add r5, r5, r11       17
                     32            inc r4                17
                     33            br L2                 18
B10    19
B11    21, 22, 23    34   L3:      div r8, r5, r3        21
                     35            load r10, 0(r2)       15
                     36            add r6, r10, r7       22
                     37            rshft r6, #1          22
                     38            mov r10, r6           23
                     39            lshft r10, #1         23
                     40            cmp r7, r10           23
                     41            bgt L4                23
B12    23            42            mov r10, r8           23
                     43            lshft r10, #1         23
                     44            cmp r7, r11           23
                     45            bgt L4                23
B13    24            46            cmp r7, #100          24
                     47            bgt L4                24
B14    24            48            cmp r6, #96           24
                     49            bgt L4                24
B15    25            50            mov r9, #0            25
                     51            br L5                 25
B16    24            52   L4:      mov r9, #1            24
B17    27            53   L5:      mov r0, r9            27
                     54            load r4, 0(sp)        27
                     55            load r5, 4(sp)        27
                     56            load r6, 8(sp)        27
                     57            load r7, 12(sp)       27
                     58            load r8, 16(sp)       27
                     59            load r9, 20(sp)       27
                     60            load r10, 24(sp)      27
                     61            load r11, 28(sp)      27
                     62            add sp, sp, #32       27
                     63            jmp 0                 27



The technique for overlaying an optimized assembly-language program onto the template of the basic block structure
of the unoptimized assembly-language version of the program is described in U.S. Patent No. 5,713,010. That patent is
hereby incorporated by reference in its entirety. Essentially, the ALN associated with each instruction in the unoptimized
version of the routine remains associated with the instruction in the optimized version of the routine, regardless of how
the instruction is moved relative to other instructions or whether the instruction is duplicated. By contrast, the LLNs
associated with the basic blocks remain fixed. For example, consider a case where a branch instruction and its
target are both moved during the optimization. One case of this phenomenon can be seen by comparing Tables 1 and 6.
The "bge" instruction on line 15 of Table 1 corresponds to the "bge" instruction on line 27 of Table 6. The target of that
branch instruction is line 38 in Table 1, but, in Table 6, the target of the branch instruction is line 34. Using the LLNs and
ALNs in Table 6, the original source-language location for the branch instruction can be reconstructed. The ALN of the
"bge" instruction on line 27 of Table 6 is line 8. This means that the branch instruction corresponds to a statement in the
unoptimized source-language version of the routine "skewed" on line 8. Inspection of the unoptimized source-language
version of the routine "skewed" reveals that the branch instruction implements a portion of the while statement on line
8. The target of the branch instruction on line 27 of Table 6 is line 34 of Table 6. Line 34 occurs in basic block "B11."
That basic block is, in turn, associated with LLN 21. Inspection of the unoptimized source-language version of the
routine "skewed," above, reveals that line 21 is where the average is calculated and is the first non-conditional statement
following the while-loop comprising lines 8-18. Thus, by using the ALN as the source-language location of the branch
instruction, and using the LLN of the basic block in which the target instruction is located, one can relate branch
instructions and the targets of branch instructions in the optimized assembly-language version of a routine back to the
unoptimized source-language version of the routine.
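The lookup just described can be sketched in C++ as follows. The struct "LineInfo", the function "relateArc," and the helper "table6Excerpt" are hypothetical names introduced for this sketch; only the two rows of Table 6 that the worked example needs are included, and each block is represented by the first of its LLNs.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical per-instruction annotation for an optimized listing.
struct LineInfo {
    int aln;  // actual line number, which travels with the instruction
    int lln;  // first logical line number of the enclosing basic block
};

// Returns (source line of the branch, source line of the target): the ALN of
// the branch instruction and the LLN of the block containing the target.
std::pair<int, int> relateArc(const std::map<int, LineInfo>& table,
                              int branchLine, int targetLine) {
    assert(table.count(branchLine) != 0 && table.count(targetLine) != 0);
    return std::make_pair(table.at(branchLine).aln, table.at(targetLine).lln);
}

// Two rows taken from Table 6: line 27 (bge L3) and line 34 (L3: div).
std::map<int, LineInfo> table6Excerpt() {
    std::map<int, LineInfo> t;
    t[27] = LineInfo{8, 17};   // ALN 8, in block B9 whose first LLN is 17
    t[34] = LineInfo{21, 21};  // ALN 21, in block B11 whose first LLN is 21
    return t;
}
```

Calling relateArc(table6Excerpt(), 27, 34) gives the pair (8, 21), matching the reconstruction in the text: the branch belongs to the while statement on source line 8 and its target is source line 21.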

[0032] Table 7, below, shows the reconstruction of the unoptimized source-language locations of the branch and
target instructions of the optimized assembly-language version of the routine "skewed." Thus, Table 7 corresponds to
Table 5, shown above, but with the source line information completed by the process outlined above for each branch
instruction. Note that the correspondence between Table 7 and the source-language version of the routine "skewed" is
not necessarily perfect. For example, the unoptimized source-language statement on line 19 is removed during
optimization, and thus, in Table 7, there is no branch instruction corresponding to source line 19. However, for the critical
control statements, such as the while-loop on line 8 of the unoptimized source-language version of the routine "skewed," a
correlation can be easily constructed and entered into Table 7.



TABLE 7

          branch                         target
assembly line   source line    assembly line   source line    count
     17             10              20              16
     17             10              18
     19             13              53              27
     27              8              34              21           1
     33             18              26              17           5
     41             23              52              24           0
     41             23              42              23           1
     45             23              52              24           0
     45             23              46              24           1
     47             24              52              24           0
     47             24              48              24           1
     49             24              52              24           0
     49             24              50              25           1
     51             25              53              27           1



Implementation 

[0033] The C++-like pseudo code shown below provides a straightforward implementation of the method of the
present invention.



 1  class instruction
 2  {
 3      Boolean branch();
 4  }
 5
 6  class routine
 7  {
 8      instruction *getInstruction (int pc);
 9      int ALN (int pc);
10      int LLN (int pc);
11  }
12
13  class arcTable
14  {
15      arcTableEntry *getEntry (int branch, int target);
16      arcTableEntry *newEntry (int branch, int target, int weight);
17      int getNumEntries ();
18      arcTableEntry *getNthEntry (int n);
19  }
20
21  class arcTableEntry
22  {
23      arcTableEntry (int branch, int target);
24      void incCount (int weight);
25      int getBranch();
26      int getTarget();
27      int getWeight();
28  }
29
30  storeArc (int pc, int nextPC, int weight = 1, arcTable & arcTab, routine & rtn)
31  {
32      arcTableEntry *entry;
33
34      if (rtn.getInstruction(pc)->branch())
35      {
36          entry = arcTab.getEntry (rtn.ALN(pc), rtn.LLN(nextPC));
37          if (entry != NULL) entry->incCount(weight);
38          else arcTab.newEntry (rtn.ALN(pc), rtn.LLN(nextPC), weight);
39      }
40  }



[0034] The class "instruction," declared above on lines 1-4, represents an instruction of an optimized
assembly-language or machine-code version of a routine. Only one method, the method "branch," is declared, on line 3.
This method returns the Boolean value TRUE if the instruction is a branch instruction, and returns the Boolean value
FALSE if the instruction is not a branch instruction.

[0035] The class "routine," declared above on lines 6-11, represents an optimized assembly-language or machine-code
routine for which the run-time memory location is known. Three methods are declared within the class "routine"
on lines 8-10: (1) the method "getInstruction" that takes a memory address, or the contents of the program counter, as an
argument and returns a pointer to the instruction at that address; (2) the method "ALN" that takes a memory address,
or the contents of the program counter, as an argument and returns the ALN associated with the instruction at that address;
and (3) the method "LLN" that takes a memory address, or the contents of the program counter, as an argument and
returns the LLN associated with the basic block in which the instruction at that address resides.

[0036] The class "arcTable," declared above on lines 13-19, represents an arc or branch table similar to Tables 3,
5, and 7, shown above in previous subsections, but having columns only for the source line numbers of the branch
instructions and targets of the branch instructions and a relative frequency, or weight, for each arc. An arc table
represented by an instance of the class "arcTable" thus includes columns 2, 4, and 5 of the above-mentioned tables. The
class "arcTable" includes four methods declared on lines 15-18: (1) the method "getEntry" that takes the source line
numbers of a branch and a target instruction as arguments and returns a pointer to the entry within the arc table
represented by an instance of the class "arcTable" having corresponding branch and target source line numbers, if there is
such an entry, and a NULL pointer if there is no such entry; (2) the method "newEntry" that takes the source line
numbers of a branch and target and a weight, creates a new entry in the arc table represented by an instance of the class
"arcTable" that includes the source line numbers and weight supplied in the arguments, and returns a pointer to the new
entry; (3) the method "getNumEntries" that returns the number of entries in the arc table represented by an instance of
the class "arcTable;" and (4) the method "getNthEntry" that returns a pointer to the entry represented by the ordinal
supplied as the argument, if such an entry exists, and a NULL pointer if no such entry exists.

[0037] The class "arcTableEntry," declared above on lines 21-28, represents a single entry in an arc table
represented by an instance of the class "arcTable," discussed above. The methods shown for the class "arcTableEntry"
include: (1) the constructor "arcTableEntry;" (2) the method "incCount" that increments the count, or weight, associated
with the entry by the number supplied in the argument "weight;" and (3) three methods, "getBranch," "getTarget," and
"getWeight," that return the source line numbers and count contained within the arc table entry represented by an
instance of the class "arcTableEntry."

[0038] It should be noted that implementations of the methods of the above-described classes are not shown.
These methods can be implemented in many different ways, and the implementations are quite straightforward. The
declarations of classes are given to facilitate understanding of the routine "storeArc," implemented above on lines
30-40.

[0039] The routine "storeArc" may be called during profile data collection conducted on an executing optimized
machine-code program or routine to build up an arc table representing the frequency of execution of branches within
the source-language version of the program or routine, as discussed above in the previous subsection. The routine
"storeArc" may be called, for example, at each point during profile data collection when an instruction is trapped or the
program counter is sampled. The routine "storeArc" takes five arguments: (1) "pc," the memory address of a branch
instruction; (2) "nextPC," the memory address of the target instruction of the branch instruction referred to by the first
argument; (3) "weight," the weight, or count, to assign to a new arc, or by which to increment the weight or count
associated with an existing arc, that represents the branch and target instructions; (4) a reference
"arcTab" to an instance of the class "arcTable" in which profile data is being accumulated; and (5) a reference "rtn" to an
instance of the class "routine" that represents a routine that is being profiled.

[0040] The routine "storeArc" determines, on line 34, whether the argument "pc" refers to a branch instruction. If so,
then the routine "storeArc" calls, on line 36, the "getEntry" method of the instance of the class "arcTable" referred to by
the reference "arcTab," using the methods "ALN" and "LLN" of the class "routine" to generate the source line numbers
of the branch and target instructions supplied as arguments "pc" and "nextPC," respectively, and places the result of the
call into the variable "entry." If an entry exists in arcTab for the source line numbers of the branch and target instructions,
as determined by the routine "storeArc" on line 37, then the count, or weight, associated with that entry is incremented
on line 37 to accumulate the instruction trap or program counter sampling that generated the call to the routine
"storeArc." If no entry exists, then, on line 38, a new entry is created in the arc table represented by the instance of the
class "arcTable" referred to by the reference "arcTab." Thus, by calling the routine "storeArc" during profiling of an
executing optimized machine-code version of a routine or program, an arc table similar to Table 7 can be compiled to
represent a profile of the frequency of execution of control statements within the source-language version of the routine or
program.
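For readers who want something that compiles, the behavior of "storeArc" can be approximated with standard containers. The following is a minimal sketch and not the pseudo code itself: the struct "Routine" with its lookup maps, the typedef "ArcTable," and the member name "isBranch" are assumptions, and the defaulted "weight" parameter is moved to the end of the parameter list because C++ requires default arguments to be trailing.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>

// Stand-in for the class "routine": answers the per-address queries from
// lookup tables that are assumed to have been filled in beforehand.
struct Routine {
    std::set<int> branches;   // addresses that hold branch instructions
    std::map<int, int> aln;   // address -> actual line number
    std::map<int, int> lln;   // address -> logical line number of its block
    bool isBranch(int pc) const { return branches.count(pc) != 0; }
    int ALN(int pc) const { return aln.at(pc); }
    int LLN(int pc) const { return lln.at(pc); }
};

// Arc table keyed by (branch source line, target source line) -> weight.
typedef std::map<std::pair<int, int>, int> ArcTable;

// Ignores non-branch instructions; otherwise adds the weight to the matching
// arc, creating the arc with an initial weight of zero on first use.
void storeArc(int pc, int nextPC, ArcTable& arcTab, const Routine& rtn,
              int weight = 1) {
    assert(weight > 0);
    if (!rtn.isBranch(pc)) return;
    arcTab[std::make_pair(rtn.ALN(pc), rtn.LLN(nextPC))] += weight;
}
```

Using a map keyed by the pair of source lines folds the "getEntry" and "newEntry" steps into a single indexing operation, since indexing a missing key creates a zero-initialized entry. Calling storeArc five times for the arc from address 27 to address 28 and once for the arc from 27 to 34 would leave two entries, with weights 5 and 1.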

[0041] Although the present invention has been described in terms of preferred embodiments, it is not intended that
the invention be limited to these embodiments. Modifications within the spirit of the invention will be apparent to those
skilled in the art. For example, various additional types of data may be collected and stored in an arc table, and many
different types of formats for arc tables may be employed. The present invention may be implemented in any number of
different languages on any number of different types of computer systems. The profile data can also be stored in a
database management system. Various techniques may be employed for determining and storing logical line numbers and
actual line numbers along with the instructions of an optimized routine or program. Rather than creating an arc table,
the technique of the present invention may be used to directly relate optimized routine or program instructions back to
source-language code in order to produce annotated source files or source line profile histograms.

[0042] The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough
understanding of the invention. However, it will be apparent to one skilled in the art that the specific details are not
required in order to practice the invention. The foregoing descriptions of specific embodiments of the present invention
are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention
to the precise forms disclosed. Obviously, many modifications and variations are possible in view of the above
teachings. The embodiments are shown and described in order to best explain the principles of the invention and its
practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with
various modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be
defined by the following claims and their equivalents.

Claims

1. A method for relating profile data (700) collected during execution of an optimized computer program back to the
source-language description of the computer program (400), the optimized computer program including actual line
numbers (602) associated with each intermediate code instruction and logical line numbers (604) associated with
each basic block (606-608) within an intermediate code program (600), the method comprising:

accumulating profile data (700) by storing and incrementing counts (802) associated with execution of branch
instructions (610, 614), each execution of a branch instruction transferring control to a particular target
instruction (612, 616), and each count (802) associated with one particular branch instruction (610) and one particular
target instruction (612); and

relating each accumulated count (802) to the actual line number (804) of the branch instruction associated with
the count and the logical line number (806) of the basic block that contains the target instruction associated
with the count.
2. The method of claim 1 wherein each accumulated count (802) is related to the actual line number (804) of the
branch instruction associated with the count and the logical line number (806) of the basic block that contains the
target instruction associated with the count during the accumulation of profile data (700), and wherein the method
further includes:

creating an arc table (902, 802, 804, 806) to represent the accumulated counts (802) related to the actual line
number (804) of the branch instruction associated with the count and the logical line number (806) of the basic
block that contains the target instruction associated with the count, each entry of the arc table including the
actual line number associated with a branch instruction, the logical line number of the basic block that contains
a target instruction of the branch instruction, and the count that represents the frequency of execution of the
branch instruction resulting in transfer of control to the target instruction.

3. The method of claim 1 wherein the profile data (700) is accumulated by a technique selected from among the
following techniques:

by trapping execution of branch instructions (610) during execution of the optimized computer program (600)
and incrementing the count (802) associated with the branch instruction (610) and the target instruction (612)
to which the branch instruction transfers control;

by interrupting execution at random times and accumulating counts indicating the relative
numbers of times different instructions are executed; and

by accessing information gathered at run-time during execution of a program (600) by branch prediction
hardware incorporated within a processor that executes the program.

4. The method of claim 1 wherein the profile data (700) is accumulated during dynamic translation of source code
(400) that includes annotations to direct the translator to collect particular profile data.

5. The method of claim 1 further including:

using the accumulated counts (802) related back to the actual line numbers (804) of the branch instructions
associated with the counts and the logical line numbers (806) of the basic blocks that contain the target
instructions associated with the counts to annotate the source-language (400) description of the computer program.

6. A system (900) for profiling execution of an optimized computer program, the system comprising:

a computer system on which the optimized computer program is executed;

the optimized computer program (600) stored within the computer system, each instruction (610, 612, 614,
616) of the optimized computer program associated with an actual line number (602), and each basic block
(606, 607, 608) of the optimized computer program associated with a number of logical line numbers (604);
and

a profiling component that monitors execution of the optimized computer program, accumulates relative
frequencies with which each branch instruction (610, 614) transfers control to target instructions (612, 616), and
relates the accumulated relative frequencies with which branch instructions transfer control to target
instructions back to corresponding source language statements.

7. The system of claim 6 wherein the profiling component relates the accumulated relative frequency with which a
branch instruction (610) transfers control to a target instruction (612) back to corresponding source language
statements (400) by associating the accumulated relative frequency with the actual line number (804) associated with
the branch instruction and the logical line number (806) associated with the basic block that contains the target
instruction.

8. The system of claim 7 wherein the profiling component creates an arc table (902, 802, 804, 806) to represent the
accumulated relative frequencies, each entry of the arc table including the actual line number (804) associated with
a branch instruction (610, 614), the logical line number (806) of the basic block that contains a target instruction
(612, 616) of the branch instruction, and the accumulated relative frequency of execution of the branch instruction
resulting in transfer of control to the target instruction.

9. The system of claim 8 wherein the profiling component uses the accumulated relative frequencies with which
branch instructions (610, 614) transfer control to target instructions (612, 616) related back to corresponding
source language statements to annotate a source-language description (400) of the computer program.

10. The system of claim 6 wherein the profiling component further comprises:

a monitoring component that monitors execution of the optimized computer program (600) and accumulates
relative frequencies with which each branch instruction (610, 614) transfers control to target instructions (612,
616); and

a source relating component that relates the accumulated relative frequencies with which branch instructions
(610, 614) transfer control to target instructions (612, 616) back to corresponding source language statements
(400).